
[KVCache] Support only flush FD GPU Cache index by AttentionStore #7609

Merged
Jiang-Jia-Jun merged 1 commit into PaddlePaddle:develop from jackyYang6:kvcache/as_only_flush on Apr 28, 2026

Conversation

jackyYang6 (Contributor) commented Apr 24, 2026

Motivation

This PR improves the FD_AS_ONLY_FLUSH flow for AttentionStore so FastDeploy can flush KV cache index state when GPU cache blocks are evicted, especially in pure-GPU cache deployments without CPU cache.

It adds the required flush metadata to support more accurate AttentionStore index updates for GPU eviction.

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

  • Extend WriteStorageTask with:
    • flush_cache_exists to indicate whether cache still exists on the current node in FD_AS_ONLY_FLUSH mode.
    • start_write_block_idx to support partial flush/write from a specified block index.
  • Update cache_transfer_manager AS-only flush path to call AttentionStore.flush_token_index(...) with both start_write_block_idx and reside_in_gpu.
  • Propagate FD_AS_ONLY_FLUSH to the cache transfer manager subprocess.
  • Update prefix_cache_manager.free_block_ids_async(...) to emit flush-only tasks when GPU cache blocks are directly evicted in FD_AS_ONLY_FLUSH + attention_store mode.
  • Add FD_AS_ONLY_FLUSH environment variable entry in fastdeploy/envs.py.
  • Add unit test coverage for GPU eviction flush behavior, including:
    • flush_cache_exists=False
    • empty gpu_block_ids in flush-only mode
    • correct start_write_block_idx=depth-1
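As a sketch of how the extended task metadata might fit together (only the two new fields, flush_cache_exists and start_write_block_idx, come from this PR's description; the dataclass shape, other fields, and defaults are assumptions, not the actual FastDeploy definition):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WriteStorageTask:
    # Hypothetical sketch of the extended task. Only flush_cache_exists and
    # start_write_block_idx are new fields described in this PR.
    task_id: str
    gpu_block_ids: List[int] = field(default_factory=list)
    # True while the cache still exists on the current node (FD_AS_ONLY_FLUSH mode);
    # False when GPU blocks were directly evicted.
    flush_cache_exists: bool = True
    # Index of the first block to flush/write, enabling partial flushes.
    start_write_block_idx: int = 0


# A flush-only task for a GPU eviction at depth 3: start from block index 2.
task = WriteStorageTask(
    task_id="t1",
    gpu_block_ids=[],
    flush_cache_exists=False,
    start_write_block_idx=2,
)
```

This mirrors the unit-test scenarios listed above: flush_cache_exists=False, empty gpu_block_ids, and start_write_block_idx=depth-1.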

Usage or Command

For FD_AS_ONLY_FLUSH mode with AttentionStore:

export FD_AS_ONLY_FLUSH=1

Reference test command:

python3 -m pytest tests/cache_manager/test_prefix_cache_manager.py -q

Accuracy Tests

N/A. This PR does not change model forward results or kernel numerical behavior. It only updates KV cache index flush metadata and adds unit tests for cache manager behavior.

Checklist

  • Add at least a tag in the PR title.
    • Suggested title: [KVCache] Support flush FD GPU/CPU Cache index by AttentionStore
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag. (N/A for current develop PR)

paddle-bot (Bot) commented Apr 24, 2026

Thanks for your contribution!


codecov-commenter commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 51.51515% with 16 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@4c8f7df). Learn more about missing BASE report.

Files with missing lines                             Patch %   Lines
fastdeploy/cache_manager/cache_transfer_manager.py   17.64%    13 Missing and 1 partial ⚠️
fastdeploy/cache_manager/prefix_cache_manager.py     84.61%    0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7609   +/-   ##
==========================================
  Coverage           ?   71.66%           
==========================================
  Files              ?      419           
  Lines              ?    57885           
  Branches           ?     9085           
==========================================
  Hits               ?    41485           
  Misses             ?    13569           
  Partials           ?     2831           
Flag Coverage Δ
GPU 71.66% <51.51%> (?)


jackyYang6 changed the title from "[KVCache] Support only flush FD GPU/CPU Cache index by AttentionStore" to "[KVCache] Support only flush FD GPU Cache index by AttentionStore" on Apr 27, 2026
jackyYang6 force-pushed the kvcache/as_only_flush branch from 22dab4e to 429ed50 on April 27, 2026 16:20

PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-04-28 00:33:47

📋 Review Summary

PR overview: In FD_AS_ONLY_FLUSH mode, when GPU cache blocks are evicted, flush the KV cache index via AttentionStore, skipping the actual data write and updating only index state.

Scope of changes: fastdeploy/cache_manager/ (cache_tasks, cache_transfer_manager, prefix_cache_manager), fastdeploy/envs.py, tests/cache_manager/

Impact tag: [KVCache]


📝 PR Convention Check

The title [KVCache] Support only flush FD GPU Cache index by AttentionStore carries a valid official tag, and the description is complete (Motivation / Modifications / Usage / Accuracy Tests / Checklist are all filled in). Overall compliant. ✓


Issues

Level          File                                                     Summary
🟡 Suggestion  fastdeploy/cache_manager/cache_transfer_manager.py:989   The FD_AS_ONLY_FLUSH check in write_back_storage_task lacks a backend type guard; on non-attention_store backends all write operations are silently skipped
❓ Question    fastdeploy/cache_manager/prefix_cache_manager.py:1452    hash_value_flush_info keeps only the min_depth node's token_ids; confirm their length can cover deeper evicted blocks
❓ Question    fastdeploy/cache_manager/prefix_cache_manager.py:1251    With is_sync=False, the _flush_only_storage_task subprocess still sends put_transfer_done_signal; confirm there is no risk of orphan signals accumulating on the consumer side

Overall Assessment

The overall design is clear, the AS-only flush path is implemented sensibly, and the unit tests cover the core eviction scenarios. The main risk to watch is the missing backend type guard in write_back_storage_task, plus two edge cases: token_ids coverage and orphan signals. Recommend merging after the author confirms these points.

self.storage_backend
), f"storage_backend not initialized, storage_backend_type: {self.storage_backend_type}"

if envs.FD_AS_ONLY_FLUSH:

🟡 Suggestion: the FD_AS_ONLY_FLUSH check in write_back_storage_task lacks a storage_backend_type filter

The current code unconditionally early-returns into _flush_only_storage_task when FD_AS_ONLY_FLUSH=True, but that function only performs the real flush when storage_backend_type == "attention_store"; under any other backend the whole try block is a no-op. This means that if a user mistakenly sets FD_AS_ONLY_FLUSH=1 with a non-attention_store backend, all write operations are silently skipped and the cache is permanently lost with no error.

Suggest adding a backend type check here, or raising explicitly in _flush_only_storage_task for non-attention_store backends:

if envs.FD_AS_ONLY_FLUSH:
    if self.storage_backend_type != "attention_store":
        raise ValueError(
            f"FD_AS_ONLY_FLUSH is only supported with attention_store backend, "
            f"but got: {self.storage_backend_type}"
        )
    return self._flush_only_storage_task(task)

self.gpu_lru_leaf_set.remove(node)
if self.cache_config.num_cpu_blocks < need_block_num:
if node.shared_count == 0 and node.is_gpu_leaf_node:  # reclaim directly
if envs.FD_AS_ONLY_FLUSH and self.kvcache_storage_backend == "attention_store":

❓ Question: hash_value_flush_info keeps only the min_depth node, so token_ids come from the shallowest node

When multiple nodes at different depths under the same input_hash_value are evicted in one batch (e.g. depths 2, 3, and 4 all hit), only the min_depth node's token_ids are kept, and a single flush task is sent with start_write_block_idx = min_depth - 1.

I have checked the attention_store.flush_token_index implementation and confirmed its semantics: all block states from start_block_idx to the end are updated, so one flush can cover the full range from the shallowest node down to the leaf node. This logic is correct.

One edge case still needs the author's confirmation, though: do the shallowest node's input_ids (token_ids) contain a long enough sequence for the SDK to locate the deeper blocks? If input_ids only encode up to the block at min_depth, the SDK may not be able to cover the deeper evicted range.
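The min_depth grouping discussed here can be sketched as follows (a hypothetical illustration of the described behavior, not the actual prefix_cache_manager code; the function name and node tuple shape are assumptions):

```python
def group_flush_info(evicted_nodes):
    """Keep, per input_hash_value, only the shallowest evicted node.

    Hypothetical sketch of the grouping described in the review: one flush
    task per hash value, with start_write_block_idx = min_depth - 1, relying
    on flush_token_index updating every block from that index to the end.
    Each node is an (input_hash_value, depth, token_ids) tuple.
    """
    shallowest = {}
    for hash_value, depth, token_ids in evicted_nodes:
        kept = shallowest.get(hash_value)
        if kept is None or depth < kept[0]:
            shallowest[hash_value] = (depth, token_ids)
    # Emit one flush task per hash value, anchored at the shallowest depth.
    return {
        h: {"start_write_block_idx": depth - 1, "token_ids": token_ids}
        for h, (depth, token_ids) in shallowest.items()
    }


# Nodes at depths 2, 3, and 4 under the same hash collapse into one task
# starting at block index 1 (min_depth - 1), keeping the shallowest token_ids.
tasks = group_flush_info(
    [("h1", 3, [1, 2, 3]), ("h1", 2, [1, 2]), ("h1", 4, [1, 2, 3, 4])]
)
```

The open question above is visible in the sketch: the kept token_ids ([1, 2]) are shorter than the deepest node's, which is exactly the coverage concern raised for the SDK.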

raise ValueError(err_msg)

self.task_write_back_event[task.task_id] = Event()
if is_sync:

❓ Question: with is_sync=False no Event is created, but the _flush_only_storage_task subprocess still calls put_transfer_done_signal

free_block_ids_async triggers the flush via issue_write_back_storage_task(flush_task, is_sync=False); the main process neither creates a task_write_back_event nor waits for completion. But after execution, the subprocess's _flush_only_storage_task still calls put_transfer_done_signal(result); on the main-process side this signal has no corresponding Event receiver and is silently ignored.

When GPU evictions are frequent (e.g. mass eviction after many prefix hits), a large number of orphan signals could accumulate. Please confirm that the consumer side of put_transfer_done_signal is fully safe when no matching task_id is found (no memory leak, no deadlock).
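One way a consumer side can stay safe against such orphan signals is to pop the Event under a lock and drop any signal with no registered waiter (a hypothetical sketch assuming an in-process Event registry; names mirror the discussion, not the actual FastDeploy implementation):

```python
import threading


class TransferDoneConsumer:
    """Hypothetical sketch: drop transfer-done signals that have no Event."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = {}  # task_id -> threading.Event (sync tasks only)

    def register(self, task_id):
        # Only is_sync=True callers register an Event and wait on it.
        with self._lock:
            ev = threading.Event()
            self._events[task_id] = ev
            return ev

    def on_transfer_done(self, task_id):
        # Pop under the lock: a signal for an unknown task_id (e.g. an
        # is_sync=False flush) is dropped immediately, so nothing accumulates
        # and no lock is held while waking the waiter.
        with self._lock:
            ev = self._events.pop(task_id, None)
        if ev is not None:
            ev.set()
        return ev is not None


consumer = TransferDoneConsumer()
ev = consumer.register("sync-task")
handled = consumer.on_transfer_done("sync-task")   # signal matched, event set
orphan = consumer.on_transfer_done("async-flush")  # orphan signal, dropped
```

Popping (rather than reading) the entry also guarantees the registry cannot grow without bound from completed sync tasks.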

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit d92cad9 into PaddlePaddle:develop Apr 28, 2026
36 of 40 checks passed

4 participants